Abnormal expression and high activity of FAK (focal adhesion kinase), confirmed by staining of tissue from patients with high-grade ovarian cancer, motivated the Schlaepfer lab to create two ovarian cancer models: KMF (spontaneous gain of the Kras, Myc, and FAK genes) and MOVCAR (an independent murine ovarian carcinoma). Here we use both the KMF and MOVCAR models to identify potential FAK-driven genes. We used the two models in vitro and in vivo in mice, isolated total RNA, and had RNA-seq performed by Novogene. Using these data and the bioinformatics pipeline described here, we aim to find candidate genes that are significant in both models and relevant for clinical prediction of disease course. The findings may point to potential drug targets and guide further research.
Here I will use the RNA-seq data from both the KMF and MOVCAR models to identify potential FAK-driven genes and then compare them with gene data from ovarian cancer patient tumor samples. If a candidate gene is significantly changed in all of these conditions, it may be a potential therapeutic target. The analysis is designed in three steps. First, by analyzing the RNA-seq data from the Schlaepfer lab, we select the genes that are significantly regulated in both the KMF and MOVCAR models. Second, by analyzing the TCGA dataset, we extract the genes that are differentially expressed in ovarian cancer patients with FAK overexpression (Ptk2 amplification). By comparing with ovarian cancer patients without FAK overexpression, we exclude the genes that are also highly expressed in patients with a single copy of Ptk2 and keep the ones that are unique to the FAK-overexpressing patients. Third, the genes from step one are filtered against the patient data. The resulting gene list should be clinically relevant to FAK-overexpressing ovarian cancer.
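The three steps above can be sketched with set operations on gene lists. This is only a minimal illustration under stated assumptions: the `padj` column name, the cutoffs (`padj < 0.05`, `|log2FoldChange| >= 1`), the helper `significant_genes`, and every gene name and value in the toy tables are placeholders, not data or settings from this analysis.

```python
import pandas as pd

def significant_genes(deg, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return upper-cased names of genes passing both cutoffs.

    The cutoffs and the 'padj' column are assumptions for illustration.
    """
    mask = (deg["padj"] < padj_cutoff) & (deg["log2FoldChange"].abs() >= lfc_cutoff)
    return set(deg.loc[mask, "gene_name"].str.upper())

# Toy differential-expression tables (placeholder genes and values).
kmf = pd.DataFrame({
    "gene_name": ["Gene_a", "Gene_b", "Gene_c", "Gene_d"],
    "log2FoldChange": [-4.7, -2.2, 1.5, 0.3],
    "padj": [1e-8, 1e-5, 0.01, 0.2],
})
movcar = pd.DataFrame({
    "gene_name": ["Gene_a", "Gene_b", "Gene_e"],
    "log2FoldChange": [-3.9, -0.4, 1.1],
    "padj": [1e-6, 0.03, 0.04],
})

# Step 1: genes significantly regulated in both mouse models.
model_genes = significant_genes(kmf) & significant_genes(movcar)

# Step 2: genes changed in Ptk2-amplified patients, minus those also
# changed in patients without the amplification (placeholder sets).
amplified = {"GENE_A", "GENE_B", "GENE_X"}
non_amplified = {"GENE_X"}
fak_specific = amplified - non_amplified

# Step 3: keep only model genes that are also specific to the patient group.
candidates = sorted(model_genes & fak_specific)
print(candidates)
```

Upper-casing mouse symbols to match human symbols, as done later in this notebook, is a rough shortcut; a proper ortholog mapping would be more reliable.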
Libraries used: pandas, NumPy, Plotly, Matplotlib.
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import math
import matplotlib.pyplot as plt
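# Load the down- and up-regulated DEG tables (KO 3D vs WT 3D)
volcano = pd.read_csv('KO_3DvsWT_3D_deg_down.csv')
volcano2 = pd.read_csv('KO_3DvsWT_3D_deg_up.csv')
# Stack both tables into one DataFrame with a fresh 0..n-1 index
result = pd.concat([volcano, volcano2], axis=0).reset_index(drop=True)
result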
| | gene_id | KO_3D1 | KO_3D2 | KO_3D3 | WT_3D1 | WT_3D2 | WT_3D3 | KO_3D | WT_3D | log2FoldChange | ... | KO_3D3_count | WT_3D1_count | WT_3D2_count | WT_3D3_count | KO_3D1_fpkm | KO_3D2_fpkm | KO_3D3_fpkm | WT_3D1_fpkm | WT_3D2_fpkm | WT_3D3_fpkm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ENSMUSG00000020469 | 126.888898 | 103.821854 | 159.705072 | 3850.392314 | 4072.511185 | 2527.382847 | 130.138608 | 3483.428782 | -4.743917 | ... | 153 | 3862 | 3678 | 2480 | 12.663993 | 10.336175 | 16.187885 | 384.791223 | 404.201159 | 250.505362 |
| 1 | ENSMUSG00000029675 | 8074.671638 | 6383.121396 | 7583.381353 | 33228.825850 | 35640.562810 | 33312.536490 | 7347.058129 | 34060.641720 | -2.212834 | ... | 7265 | 33329 | 32188 | 32688 | 58.714607 | 46.299725 | 56.002645 | 241.940966 | 257.723524 | 240.562503 |
| 2 | ENSMUSG00000041220 | 367.221512 | 388.370639 | 372.645168 | 1436.668908 | 1514.734992 | 1566.365902 | 376.079106 | 1505.923267 | -2.001648 | ... | 357 | 1441 | 1368 | 1537 | 3.362394 | 3.547243 | 3.465297 | 13.171959 | 13.792577 | 14.243369 |
| 3 | ENSMUSG00000040170 | 7.562914 | 12.497075 | 10.438240 | 304.083287 | 293.424542 | 161.018746 | 10.166076 | 252.842192 | -4.646132 | ... | 10 | 305 | 265 | 158 | 0.079397 | 0.130872 | 0.111292 | 3.196534 | 3.063362 | 1.678762 |
| 4 | ENSMUSG00000040488 | 1962.156132 | 1996.648063 | 1747.361374 | 6389.737011 | 7796.234707 | 8360.745515 | 1902.055190 | 7515.572411 | -1.982193 | ... | 1674 | 6409 | 7041 | 8204 | 17.877158 | 18.146398 | 16.168584 | 58.293619 | 70.637941 | 75.649986 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1160 | ENSMUSG00000004791 | 45.377487 | 28.839404 | 42.796784 | 10.966938 | 22.145248 | 9.171954 | 39.004558 | 14.094714 | 1.477758 | ... | 41 | 11 | 20 | 9 | 0.920434 | 0.583529 | 0.881632 | 0.222746 | 0.446705 | 0.184762 |
| 1161 | ENSMUSG00000079559 | 16.806477 | 9.613135 | 9.394416 | 0.000000 | 2.214525 | 2.038212 | 11.938009 | 1.417579 | 3.099336 | ... | 9 | 0 | 2 | 2 | 0.774364 | 0.441832 | 0.439605 | 0.000000 | 0.101470 | 0.093265 |
| 1162 | ENSMUSG00000113780 | 2.520971 | 10.574448 | 5.219120 | 0.000000 | 0.000000 | 0.000000 | 6.104847 | 0.000000 | 4.992515 | ... | 5 | 0 | 0 | 0 | 0.063520 | 0.265780 | 0.133556 | 0.000000 | 0.000000 | 0.000000 |
| 1163 | ENSMUSG00000025175 | 42.016191 | 69.214569 | 53.235024 | 26.918848 | 23.252511 | 4.076424 | 54.821928 | 18.082594 | 1.600543 | ... | 51 | 27 | 21 | 4 | 0.562421 | 0.924201 | 0.723714 | 0.360807 | 0.309530 | 0.054191 |
| 1164 | ENSMUSG00000053965 | 263.861681 | 288.394039 | 99.163280 | 123.627304 | 108.511717 | 71.337419 | 217.139667 | 101.158813 | 1.103457 | ... | 95 | 124 | 98 | 70 | 1.580439 | 1.723105 | 0.603221 | 0.741462 | 0.646348 | 0.424344 |
1165 rows × 33 columns
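# Upper-case the mouse gene symbols so they can be matched against other gene lists
result['gene_name'] = result['gene_name'].str.upper()
# Display only: the result is not assigned, so `result` keeps gene_name as a column
result.set_index('gene_name')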
| gene_name | gene_id | KO_3D1 | KO_3D2 | KO_3D3 | WT_3D1 | WT_3D2 | WT_3D3 | KO_3D | WT_3D | log2FoldChange | ... | KO_3D3_count | WT_3D1_count | WT_3D2_count | WT_3D3_count | KO_3D1_fpkm | KO_3D2_fpkm | KO_3D3_fpkm | WT_3D1_fpkm | WT_3D2_fpkm | WT_3D3_fpkm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MYL7 | ENSMUSG00000020469 | 126.888898 | 103.821854 | 159.705072 | 3850.392314 | 4072.511185 | 2527.382847 | 130.138608 | 3483.428782 | -4.743917 | ... | 153 | 3862 | 3678 | 2480 | 12.663993 | 10.336175 | 16.187885 | 384.791223 | 404.201159 | 250.505362 |
| ELN | ENSMUSG00000029675 | 8074.671638 | 6383.121396 | 7583.381353 | 33228.825850 | 35640.562810 | 33312.536490 | 7347.058129 | 34060.641720 | -2.212834 | ... | 7265 | 33329 | 32188 | 32688 | 58.714607 | 46.299725 | 56.002645 | 241.940966 | 257.723524 | 240.562503 |
| ELOVL6 | ENSMUSG00000041220 | 367.221512 | 388.370639 | 372.645168 | 1436.668908 | 1514.734992 | 1566.365902 | 376.079106 | 1505.923267 | -2.001648 | ... | 357 | 1441 | 1368 | 1537 | 3.362394 | 3.547243 | 3.465297 | 13.171959 | 13.792577 | 14.243369 |
| FMO2 | ENSMUSG00000040170 | 7.562914 | 12.497075 | 10.438240 | 304.083287 | 293.424542 | 161.018746 | 10.166076 | 252.842192 | -4.646132 | ... | 10 | 305 | 265 | 158 | 0.079397 | 0.130872 | 0.111292 | 3.196534 | 3.063362 | 1.678762 |
| LTBP4 | ENSMUSG00000040488 | 1962.156132 | 1996.648063 | 1747.361374 | 6389.737011 | 7796.234707 | 8360.745515 | 1902.055190 | 7515.572411 | -1.982193 | ... | 1674 | 6409 | 7041 | 8204 | 17.877158 | 18.146398 | 16.168584 | 58.293619 | 70.637941 | 75.649986 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| PGF | ENSMUSG00000004791 | 45.377487 | 28.839404 | 42.796784 | 10.966938 | 22.145248 | 9.171954 | 39.004558 | 14.094714 | 1.477758 | ... | 41 | 11 | 20 | 9 | 0.920434 | 0.583529 | 0.881632 | 0.222746 | 0.446705 | 0.184762 |
| COLCA2 | ENSMUSG00000079559 | 16.806477 | 9.613135 | 9.394416 | 0.000000 | 2.214525 | 2.038212 | 11.938009 | 1.417579 | 3.099336 | ... | 9 | 0 | 2 | 2 | 0.774364 | 0.441832 | 0.439605 | 0.000000 | 0.101470 | 0.093265 |
| GM33195 | ENSMUSG00000113780 | 2.520971 | 10.574448 | 5.219120 | 0.000000 | 0.000000 | 0.000000 | 6.104847 | 0.000000 | 4.992515 | ... | 5 | 0 | 0 | 0 | 0.063520 | 0.265780 | 0.133556 | 0.000000 | 0.000000 | 0.000000 |
| FN3K | ENSMUSG00000025175 | 42.016191 | 69.214569 | 53.235024 | 26.918848 | 23.252511 | 4.076424 | 54.821928 | 18.082594 | 1.600543 | ... | 51 | 27 | 21 | 4 | 0.562421 | 0.924201 | 0.723714 | 0.360807 | 0.309530 | 0.054191 |
| PDE5A | ENSMUSG00000053965 | 263.861681 | 288.394039 | 99.163280 | 123.627304 | 108.511717 | 71.337419 | 217.139667 | 101.158813 | 1.103457 | ... | 95 | 124 | 98 | 70 | 1.580439 | 1.723105 | 0.603221 | 0.741462 | 0.646348 | 0.424344 |
1165 rows × 32 columns
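# Use -log10(p-value) for the volcano plot y-axis; keep the raw p-values
# in their own column so this cell can be re-run safely
result['neg_log10_pvalue'] = -np.log10(result['pvalue'])
# Flag genes that appear in the KMF/MOVCAR gene list
significant_gene = pd.read_csv('Genes_KMF_Mov_Joanna.csv')
significant_gene['Genes'] = significant_gene['Genes'].str.upper()
result['test'] = result['gene_name'].isin(significant_gene['Genes'])
result['test'] = result['test'].map({True: 'True', False: 'False'})
# Red = gene is in the list, blue = gene is not
colorIdx = {'True': 'rgb(215,48,39)', 'False': 'rgb(39, 77, 215)'}
cols = result['test'].map(colorIdx)
fig = go.Figure()
trace1 = go.Scatter(
    x=result['log2FoldChange'],
    y=result['neg_log10_pvalue'],
    # Label only the flagged genes; 'markers+text' draws nothing without text
    text=result['gene_name'].where(result['test'] == 'True', ''),
    textposition="top center",
    mode='markers+text',
    opacity=0.5,
    marker=dict(size=7, color=cols),
)
fig.add_trace(trace1)
fig.update_layout(xaxis_title='log2 fold change', yaxis_title='-log10(p-value)')
fig.show()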
Overall, some of the genes from our models appear to be significantly regulated in the tumor samples. However, the genes that are significant in the KMF model do not seem to overlap much with those in the MOVCAR model. I still need to work on analyzing the clinical data.
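One way the overlap between the KMF and MOVCAR gene lists might be quantified is with a Jaccard index and a hypergeometric tail probability. The sketch below is an assumption-laden illustration, not part of the analysis above: the helper names `jaccard` and `hypergeom_tail`, the placeholder gene sets, and the background size of 10 genes are all hypothetical. In practice the background would be the number of genes tested in both experiments.

```python
from math import comb

def jaccard(a, b):
    """Shared genes divided by all genes in either list."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def hypergeom_tail(k, n_a, n_b, background):
    """P(overlap >= k) if n_b genes were drawn at random from
    `background` genes, of which n_a belong to list A."""
    total = comb(background, n_b)
    upper = min(n_a, n_b)
    hits = sum(comb(n_a, i) * comb(background - n_a, n_b - i)
               for i in range(k, upper + 1))
    return hits / total

# Placeholder gene sets and background size (hypothetical values).
kmf_genes = {"GENE_A", "GENE_B", "GENE_C"}
movcar_genes = {"GENE_B", "GENE_C", "GENE_D"}
background_size = 10

shared = kmf_genes & movcar_genes
j = jaccard(kmf_genes, movcar_genes)
p = hypergeom_tail(len(shared), len(kmf_genes), len(movcar_genes), background_size)
print(sorted(shared), j, p)
```

A small Jaccard index together with a large tail probability would suggest that the overlap is no greater than expected by chance under this simple model.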